Predicting local malaria exposure using a Lasso-based two-level cross validation algorithm

Authors

  • Bienvenue Kouwaye
  • Fabrice Rossi
  • Noël Fonton
  • André Garcia
  • Simplice Dossou-Gbété
  • Mahouton Norbert Hounkonnou
  • Gilles Cottrell
Abstract

Recent studies have highlighted the importance of local environmental factors in determining the fine-scale heterogeneity of malaria transmission and exposure to the vector. In this work, we compare a classical GLM with backward selection against several versions of an automatic LASSO-based algorithm with two-level cross-validation. The aim is to build a predictive model of the space- and time-dependent individual exposure to the malaria vector, using entomological and environmental data from a cohort study in Benin. Although the GLM can outperform the LASSO model with appropriate engineering, the best model in terms of predictive power was found to be the LASSO-based model. Our approach can be adapted to different topics and may therefore help address prediction issues in other health science domains.
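The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a LASSO fit wrapped in two-level (nested) cross-validation can look like: an inner loop picks the penalty, an outer loop estimates predictive error on data the selection never saw. It assumes a squared-error (Gaussian) loss rather than whatever count model the authors used, a plain coordinate-descent solver, and synthetic data; all function names (`fit_lasso`, `select_lambda`, `two_level_cv`) are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the LASSO proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_lasso(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b0, beta = sum(y) / n, [0.0] * p
    resid = [yi - b0 for yi in y]            # residuals of the current model
    cols = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # correlation of feature j with the partial residual
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
        shift = sum(resid) / n               # unpenalized intercept update
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta

def mse(model, X, y):
    b0, beta = model
    preds = [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def k_folds(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[f::k] for f in range(k)]

def split(X, y, idx, held):
    held_set = set(held)
    train = [i for i in idx if i not in held_set]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in held], [y[i] for i in held])

def select_lambda(X, y, lambdas, k):
    """Inner level: choose the penalty with the lowest cross-validated error."""
    idx = list(range(len(y)))
    folds = k_folds(idx, k)
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for held in folds:
            Xtr, ytr, Xte, yte = split(X, y, idx, held)
            err += mse(fit_lasso(Xtr, ytr, lam), Xte, yte)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

def two_level_cv(X, y, lambdas, outer_k=5, inner_k=4, seed=0):
    """Outer level: score each held-out fold with a penalty tuned without it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    results = []
    for held in k_folds(idx, outer_k):
        Xtr, ytr, Xte, yte = split(X, y, idx, held)
        lam = select_lambda(Xtr, ytr, lambdas, inner_k)
        results.append((lam, mse(fit_lasso(Xtr, ytr, lam), Xte, yte)))
    return results

# Synthetic data (an assumption): only the first two of five features matter.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.3) for r in X]
lambdas = [0.01, 0.1, 0.5, 1.0]
results = two_level_cv(X, y, lambdas)
for fold, (lam, err) in enumerate(results):
    print(f"outer fold {fold}: lambda={lam}, test MSE={err:.3f}")
```

Keeping penalty selection strictly inside each outer training set is what makes the outer error an honest estimate; tuning the penalty on the same folds used for the final score would bias it downward.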


Similar resources

Regression Trees and Random forest based feature selection for malaria risk exposure prediction

This paper deals with predicting the number of Anopheles, the main malaria vector, using environmental and climate variables. Variable selection is based on an automatic machine learning method using regression trees and random forests combined with stratified two-level cross-validation. The minimum threshold of variable importance is assessed using the quadratic distance of variabl...


Coordinate Descent Algorithms for Lasso Penalized Regression

Imposition of a lasso penalty shrinks parameter estimates toward zero and performs continuous model selection. Lasso penalized regression is capable of handling linear regression problems where the number of predictors far exceeds the number of cases. This paper tests two exceptionally fast algorithms for estimating regression coefficients with a lasso penalty. The previously known ℓ2 algorithm...


Predicting The Type of Malaria Using Classification and Regression Decision Trees

Maryam Ashoori (School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran) and Fatemeh Hamzavi (School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran). Background: Malaria is an infectious disease infecting 200-300 million people annually. Environme...


Development and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles

Relatively few publications have used linear regression models to predict a continuous response from microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We have evaluated three linear regression algorithms that can be used for the prediction of a continuous response based on high d...


Comparison of Genomic Selection Models to Predict Flowering Time and Spike Grain Number in Two Hexaploid Wheat Doubled Haploid Populations

Genomic selection (GS) is becoming an important selection tool in crop breeding. In this study, we compared the ability of different GS models to predict time to young microspore (TYM), a flowering time-related trait, spike grain number under control conditions (SGNC) and spike grain number under osmotic stress conditions (SGNO) in two wheat biparental doubled haploid populations with unrelated...



Journal title:

Volume 12, Issue

Pages -

Publication date: 2017